A Streaming Supercomputer

Author

Bill Dally

Abstract

We are in an era where computational building blocks are plentiful and inexpensive. A single chip today can hold over 100 1 GHz floating-point units, for a total performance of 100 GFLOPS per chip. Many graphics chips achieve 80 GFLOPS and over 1 TOPS of rendering performance, and cost less than $100. Embedded processors are less powerful, but incredibly cheap. It is fair to say that a raw GFLOPS costs less than $1. Memory is currently selling for less than 20 cents per MByte. Bandwidth has become less expensive as well: chips with 1 Tb/s of aggregate bandwidth have recently been demonstrated.

In this era of plenty, however, we have not developed the technology to scale computing cost-effectively. Supercomputers cost significantly more per GFLOPS and per GByte than their low-end counterparts. For example, it is estimated that the total cost of future large-scale ASCI machines with tens of thousands of nodes is greater than $1,000 per GFLOPS. This factor of 1000:1 in cost-effectiveness is paradoxical: it should be possible to reap economies of scale with computing, just as in other major acquisitions. Although scalability has long been a focus of computer science research, it has not been transferred into practical commercial systems. Now more than ever we need to build the technological infrastructure to scale computation cost-effectively.

In addition to being cost-inefficient, contemporary high-end computers, constructed from clusters of workstations or servers, do not deliver their promised performance. They achieve a small fraction of peak performance on many key applications that are dominated by global communication. Critical calculations, such as verifying nuclear weapons, performing signal intelligence, calculating the dynamics of protein folding, and modeling fluid flow through complex turbomachinery, do not map well to these machines.

The performance of the microprocessors from which these clusters are composed is no longer scaling at the historic rate of 50% per year. Microprocessors have reached a point of diminishing returns in terms of gates per clock and clocks per instruction. As we enter an era of billion-transistor chips, there is not enough explicit parallelism in conventional programs to use these resources efficiently. For example, a modern graphics processor has at least 64 floating-point ALUs and thousands of integer ALUs, almost a hundred times the arithmetic density of a microprocessor. In contrast, most of the chip area in a microprocessor is devoted to cache memory or to support infrastructure (e.g. for out-of-order execution) that keeps a few ALUs running at their peak clock rate. It is expected that, without new innovations in parallel processor design, microprocessor performance will increase only with the increase in gate speed, at a rate of about 20% per year. Such a change would have a major effect on the computer business and on the entire economy.

Cluster supercomputers, like the microprocessors they are constructed from, are inefficient because they are poorly matched both to the technology from which they are constructed and to the applications they run. They are unable to exploit efficiently the large numbers of floating-point units that can be fabricated on a chip. They also have low global bandwidth, and their register and cache architectures do not capture large amounts of application locality and hence make excessive demands on this bandwidth. Because these systems are not well designed, they are difficult to program. Programmers spend all their time working around the limitations of the machine rather than developing efficient algorithms for their applications.
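The figures quoted in the abstract can be checked with a few lines of arithmetic. The Python sketch below is only an illustration and is not taken from the paper: the function names are hypothetical, one floating-point operation per unit per clock is an assumption, and the ten-year horizon is chosen arbitrarily to show how far the 50% and 20% annual growth rates diverge.

```python
# Back-of-the-envelope checks of figures quoted in the abstract.
# All names are illustrative; nothing here comes from the paper itself.


def peak_gflops(units: int, clock_ghz: float, flops_per_cycle: int = 1) -> float:
    """Peak GFLOPS, ASSUMING each unit completes flops_per_cycle operations per clock."""
    return units * clock_ghz * flops_per_cycle


def cost_ratio(high_end_per_gflops: float, commodity_per_gflops: float) -> float:
    """How many times more one GFLOPS costs on the high-end machine."""
    return high_end_per_gflops / commodity_per_gflops


def compound_growth(rate_per_year: float, years: int) -> float:
    """Total speedup after `years` of growth at a fixed annual rate."""
    return (1.0 + rate_per_year) ** years


# 100 floating-point units at 1 GHz, one operation per cycle (assumed).
chip = peak_gflops(100, 1.0)

# $1,000 per GFLOPS (large ASCI machine) against $1 per GFLOPS (commodity part).
ratio = cost_ratio(1000.0, 1.0)

# Hypothetical ten-year horizon: historic 50%/year against gate-speed-only 20%/year.
YEARS = 10
historic = compound_growth(0.50, YEARS)
gate_speed_only = compound_growth(0.20, YEARS)

print(f"peak per chip: {chip:.0f} GFLOPS")
print(f"cost ratio: {ratio:.0f}:1")
print(f"speedup after {YEARS} years at 50%/year: {historic:.1f}x")
print(f"speedup after {YEARS} years at 20%/year: {gate_speed_only:.1f}x")
print(f"gap between the two: {historic / gate_speed_only:.1f}x")
```

Under these assumptions, the gap between the two growth rates compounds to roughly an order of magnitude within a decade, which is the scale of slowdown the abstract warns about.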

Similar resources

GraSP: Distributed Streaming Graph Partitioning

This paper presents a distributed, streaming graph partitioner, Graph Streaming Partitioner (GraSP), which makes partition decisions as each vertex is read from memory, simulating an online algorithm that must process nodes as they arrive. GraSP is a lightweight high-performance computing (HPC) library implemented in MPI, designed to be easily substituted for existing HPC partitioners such as P...

A simple and novel method for acoustic streaming power measurement of ultrasonic horn

An ultrasonic horn that transfers acoustic waves into an aqueous solution gives rise to unique properties. Transfer of a sound wave into a liquid results in liquid movement in the direction of wave propagation, which gradually loses its energy due to viscous friction. This wave motion induces a flow which is known as acoustic streaming or micro-streaming. In this article, a simple innovative ...

Modelling and Scheduling Lot Streaming Flexible Flow Lines

Although lot streaming scheduling is an active research field, lot streaming flexible flow lines problems have received far less attention than classical flow shops. This paper deals with scheduling jobs in lot streaming flexible flow line problems. The paper mathematically formulates the problem by a mixed integer linear programming model. This model solves small instances to optimality. Moreo...

Hybrid algorithms for Job shop Scheduling Problem with Lot streaming and A Parallel Assembly Stage

In this paper, a Job shop scheduling problem with a parallel assembly stage and Lot Streaming (LS) is considered for the first time in both machining and assembly stages. Lot Streaming technique is a process of splitting jobs into smaller sub-jobs such that successive operations can be overlapped. Hence, to solve job shop scheduling problem with a parallel assembly stage and lot streaming, deci...

Virtual reality movies-real-time streaming of 3D objects

Powerful servers for computation and storage, high-speed networking resources, and high-performance 3D graphics workstations, which are typically available in scientific research environments, potentially allow the development and productive application of advanced distributed high-quality multimedia concepts. Several bottlenecks, often caused by inefficient design and software implementation of...

Real-time 3d Graphics Streaming Using Mpeg-4

In this paper, we consider a real-time MPEG-4 streaming architecture to facilitate remote visualization of large scale 3D models on thin clients, which denote most of the hand-held devices that have limited computing resources. MPEG-4 serves as a key component to handle the compression, transmission, and visualization of the high-end supercomputer rendered image sequence, allowing the synchroni...

Publication date: 2001
